STCOR-895 wait a loooong time for a "stale" rotation request #1548

zburke · 2024-10-15T14:18:02Z

As part of the RTR lifecycle, we write a rotation timestamp to local storage when the process starts and then remove it when it ends. This is a cheap way of making the rotation request visible across tabs, because all tabs read the same shared storage.
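A minimal sketch of that marker, for illustration only. The key name, the helper names, and the `KeyValueStore` interface are all hypothetical and are not taken from the stripes code; `window.localStorage` would satisfy the same interface in a browser.

```typescript
// Hypothetical names throughout; the real stripes key and helpers may differ.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const ROTATION_KEY = 'example::rtrIsRotating'; // assumed key name

// Record when a rotation began so that other tabs can see it.
function markRotationStart(store: KeyValueStore, now: number = Date.now()): void {
  store.setItem(ROTATION_KEY, String(now));
}

// Clear the marker once the rotation request has settled.
function markRotationEnd(store: KeyValueStore): void {
  store.removeItem(ROTATION_KEY);
}

// Read the marker; null means no rotation is currently recorded.
function readRotationStart(store: KeyValueStore): number | null {
  const raw = store.getItem(ROTATION_KEY);
  if (raw === null) return null;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : null;
}

// In-memory stand-in for window.localStorage, usable outside a browser.
class MemoryStore implements KeyValueStore {
  private items = new Map<string, string>();
  getItem(key: string): string | null { return this.items.get(key) ?? null; }
  setItem(key: string, value: string): void { this.items.set(key, value); }
  removeItem(key: string): void { this.items.delete(key); }
}
```

If the tab closes or the request is cancelled before `markRotationEnd` runs, the timestamp stays behind, which is why it has to be checked for age.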

To avoid the problem of a cancelled request leaving cruft in storage, we inspect that timestamp and consider a request "stale" if it's too old. That was the problem here: our "too old" timeout was too short. On a busy server, on a slow connection, or on a client far from its host (say, in New Zealand), two seconds was not long enough. The rotation request would still be active when stripes considered it "stale", allowing a second request to go through. But since the first request was just slow, not dead, the backend treated the second one as a token-replay attack and immediately terminated all active sessions for that user account.

Thus, waiting longer is a quick fix. A more thorough approach to tracking the request is described in the code comments attached to #1547.
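The effect of the window size can be sketched as follows. Only the two-second figure comes from the description above; the longer value, the function names, and the state labels are assumptions made for illustration and are not the values or code in this change.

```typescript
// Hypothetical sketch: only the 2000 ms figure comes from the PR description.
const OLD_MAX_AGE_MS = 2000;
const LONGER_MAX_AGE_MS = 20000; // assumed value, illustrative only

type RotationState = 'idle' | 'in-progress' | 'stale';

// Classify the stored timestamp relative to the current time.
function classifyRotation(
  startedAt: number | null,
  now: number,
  maxAgeMs: number,
): RotationState {
  if (startedAt === null) return 'idle';
  return now - startedAt > maxAgeMs ? 'stale' : 'in-progress';
}

// A new rotation request is allowed unless a live one is already recorded.
function mayStartRotation(
  startedAt: number | null,
  now: number,
  maxAgeMs: number,
): boolean {
  return classifyRotation(startedAt, now, maxAgeMs) !== 'in-progress';
}

// A slow request that began 3 seconds ago and has not yet finished:
const startedAt = 0;
const now = 3000;
const allowedWithShortWindow = mayStartRotation(startedAt, now, OLD_MAX_AGE_MS);
const allowedWithLongWindow = mayStartRotation(startedAt, now, LONGER_MAX_AGE_MS);
```

With the short window, the three-second-old request is classified as stale and a second request is allowed, which is the failure described above. With the longer window, the same request still counts as in progress and the second request is held back.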

Refs STCOR-895

github-actions · 2024-10-15T14:26:43Z

Jest Unit Test Statistics

178 tests ±0 178 ✔️ ±0 34s ⏱️ -1s
  25 suites ±0     0 💤 ±0
    1 files ±0     0 ❌ ±0

Results for commit f3bb165. ± Comparison against base commit 1e6fc79.

github-actions · 2024-10-15T14:27:05Z

BigTest Unit Test Statistics

    1 files ±0     1 suites ±0 10s ⏱️ ±0s
267 tests ±0 261 ✔️ ±0 6 💤 ±0 0 ❌ ±0
270 runs ±0 264 ✔️ ±0 6 💤 ±0 0 ❌ ±0

Results for commit f3bb165. ± Comparison against base commit 1e6fc79.

sonarqubecloud · 2024-10-15T14:27:59Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

Refs STCOR-895 (cherry picked from commit b2083cc)

zburke added 2 commits October 15, 2024 10:12

Merge branch 'b10.1' into STCOR-895-q

f3bb165

zburke merged commit b2083cc into b10.1 Oct 21, 2024
6 checks passed

zburke deleted the STCOR-895-q branch October 21, 2024 19:12
